Create a fully automated YouTube video with Text-to-speech services & comparison of Amazon Polly vs. IBM Watson
Intro.
Text to speech (TTS) is a popular area in machine learning. As the technology has evolved, the number of TTS options has grown drastically. In recent years, cloud computing companies have improved TTS alongside the growth of big data and artificial intelligence applications. I will compare two TTS services that I used to create AWS tutorials on YouTube. The tutorials can be found in this playlist.
Nowadays, big cloud computing companies provide APIs that make speech services easy to use. In contrast to open-source TTS services, the TTS APIs provided by cloud computing companies ensure that personal data remains within the user's account. In this article, I will share my experience with Amazon Polly and IBM Watson. Note that I used the demo version of IBM Watson, and no personal data was involved.
Link to IBM Watson Text to Speech:
https://www.ibm.com/cloud/watson-text-to-speech
Link to Amazon Polly:
Other free TTS service:
https://text-to-speech.imtranslator.net/
1. IBM Watson
The first 3 videos were created with the IBM Watson Text to Speech service. This is the link to the demo version I used to create the tutorial videos.
https://text-to-speech-demo.ng.bluemix.net/
Features:
- 14 languages & variations — 27 voices (13 neural and 14 standard) across 7 languages
- The Lite plan gives you 500 minutes per month for free
- The Standard plan starts at $0.02 USD per minute
Pro
- Doesn’t require creating an account
- Source code can be forked from GitHub
Con
- Cannot resolve abbreviations such as AWS and IAM. Workaround: type “A” “W” “S” to force IBM Watson to spell out each letter
- The downloaded file doesn’t come with a file extension, so you need to append “.mp3” to each “synthesize” file manually
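Both workarounds in the list above can be scripted. The following is a minimal Python sketch and not part of the workflow I originally used: the names ACRONYMS, spell_out_acronyms, and add_mp3_extension are made up for illustration, and it assumes all downloaded audio files sit in a single folder.

```python
import re
from pathlib import Path

# Hypothetical list of abbreviations that should be read letter by letter.
ACRONYMS = ("AWS", "IAM")


def spell_out_acronyms(text, acronyms=ACRONYMS):
    """Replace each listed acronym with its letters separated by spaces."""
    for acronym in acronyms:
        spaced = " ".join(acronym)  # "AWS" becomes "A W S"
        # \b leaves the same letters inside longer words (e.g. "LAWS") untouched.
        text = re.sub(rf"\b{re.escape(acronym)}\b", spaced, text)
    return text


def add_mp3_extension(folder):
    """Append ".mp3" to every visible file in `folder` that has no extension."""
    renamed = []
    # sorted() builds the full list first, so renaming is safe during the loop.
    for path in sorted(Path(folder).iterdir()):
        is_hidden = path.name.startswith(".")
        if path.is_file() and not is_hidden and path.suffix == "":
            target = path.with_name(path.name + ".mp3")
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

For example, spell_out_acronyms("Log in to AWS and open IAM.") produces text with the letters separated, which can then be pasted into the demo, and add_mp3_extension can be pointed at the download folder once all the “synthesize” files have been saved.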
An example of speech using IBM Watson can be found in this video: